Discovering Linguistic Patterns Using Sequence Mining
Authors
Abstract
In this paper, we present a method based on data mining techniques to automatically discover linguistic patterns that match appositive qualifying phrases. We develop an algorithm that mines sequential patterns made of itemsets, subject to gap and linguistic constraints. The itemsets allow several kinds of information to be associated with one term. The advantage is that the extracted linguistic patterns are more expressive than the usual sequential patterns. In addition, the constraints make it possible to prune irrelevant patterns automatically. To manage the set of generated patterns, we propose a solution based on a partial ordering, so that a human user can easily validate them as relevant linguistic patterns. We illustrate the efficiency of our approach on two corpora drawn from a newspaper.
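As a rough illustration of the idea of sequential patterns made of itemsets under a gap constraint, the following minimal sketch checks whether a pattern occurs in a sentence and counts its support over a small corpus. It is not the algorithm of the paper: it only covers matching and support counting, not the mining or the linguistic constraints. The function names `matches` and `support`, the feature strings such as `pos=PROPN`, the toy sentences, and the exact meaning given to `max_gap` (the maximum number of tokens skipped between two consecutive matched itemsets) are all assumptions made for this example.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]          # several kinds of information about one token
Sequence_ = List[Itemset]         # a sentence: one itemset per token


def matches(pattern: Sequence_, sequence: Sequence_, max_gap: int) -> bool:
    """Return True if `pattern` occurs in `sequence`.

    Each pattern itemset must be a subset of the token itemset it matches,
    and at most `max_gap` tokens may be skipped between two consecutive
    matched tokens (assumed semantics of the gap constraint).
    """
    def search(p_idx: int, start: int, end: int) -> bool:
        if p_idx == len(pattern):
            return True
        for pos in range(start, min(end, len(sequence))):
            if pattern[p_idx] <= sequence[pos]:
                # The next itemset must match within the allowed gap window.
                if search(p_idx + 1, pos + 1, pos + 2 + max_gap):
                    return True
        return False

    # The first itemset of the pattern may match anywhere in the sentence.
    return search(0, 0, len(sequence))


def support(pattern: Sequence_, corpus: List[Sequence_], max_gap: int) -> float:
    """Fraction of sentences in `corpus` that contain `pattern`."""
    if not corpus:
        return 0.0
    return sum(matches(pattern, s, max_gap) for s in corpus) / len(corpus)


def tok(*features: str) -> Itemset:
    """Build the itemset describing one token (hypothetical feature encoding)."""
    return frozenset(features)


# Toy corpus: each token carries its form and a part-of-speech tag.
s1 = [tok("form=Paris", "pos=PROPN"), tok("form=,", "pos=PUNCT"),
      tok("form=the", "pos=DET"), tok("form=capital", "pos=NOUN")]
s2 = [tok("form=Obama", "pos=PROPN"), tok("form=,", "pos=PUNCT"),
      tok("form=former", "pos=ADJ"), tok("form=president", "pos=NOUN")]
s3 = [tok("form=dog", "pos=NOUN"), tok("form=barks", "pos=VERB")]
corpus = [s1, s2, s3]

# Candidate pattern: proper noun, punctuation, then a noun shortly after.
pattern = [tok("pos=PROPN"), tok("pos=PUNCT"), tok("pos=NOUN")]

if __name__ == "__main__":
    print(support(pattern, corpus, max_gap=1))
    print(support(pattern, corpus, max_gap=0))
```

With a gap of one token the pattern covers the two appositive-like sentences, while a gap of zero rejects both because a determiner or adjective sits between the comma and the noun. This shows how a gap constraint alone can decide whether a candidate pattern is kept.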
Similar resources
Finding Sequential Patterns from Large Sequence Data
Data mining is the task of discovering interesting patterns from large amounts of data. There are many data mining tasks, such as classification, clustering, association rule mining, and sequential pattern mining. Sequential pattern mining finds sets of data items that occur together frequently in some sequences. Sequential pattern mining, which extracts frequent subsequences from a sequence da...
Mining Fuzzy Temporal Itemsets within Various Time Intervals in Quantitative Datasets
This research aims at proposing a new method for discovering frequent temporal itemsets in continuous subsets of a dataset with quantitative transactions. It is important to note that although these temporal itemsets may have relatively high support or occurrence within particular time intervals, they do not necessarily get similar support across the whole dataset, which makes i...
Sequential Data Mining for Information Extraction from Texts
This paper shows the benefit of using data mining methods for Biological Natural Language Processing. A method for discovering linguistic patterns based on recursive sequential pattern mining is proposed. It requires neither sentence parsing nor any resource other than a training data set. It produces understandable results, and we show its usefulness in the extraction of relations between named...
A Less Cumulative Algorithm of Mining Linguistic Browsing Patterns in the World Wide Web
Finding sequential patterns is one of the important issues in data mining. This paper deals with linguistic (fuzzy) sequential patterns. The existing algorithms for discovering such patterns use the usual sigma counts of fuzzy sets as a measure of support. Unfortunately, a well-known side effect is an undesirable cumulation of small membership values. We propose an improved approach b...
Analysis of Loan and Resource Circulation Transactions of the Libraries of Birjand University of Medical Sciences Using Data Mining Algorithms
Introduction: Data mining is a process for discovering meaningful relationships and patterns in data. Identifying the behavior patterns of library users can help improve decision-making in libraries. This study aimed to analyze the interlibrary loan transactions in Birjand University of Medical Sciences using data mining algorithms. Methods: In this descriptive study, knowledge discovery and d...